[57] ABSTRACT

Automated capture of an uttered alphabetic character is 
provided by using an input, beyond the uttered alphabetic 
character, to disambiguate an incorrectly captured character. 
The input is an indication of a telephone key representing the 
uttered alphabetic character. The indication can be a dual 
tone multifrequency signal or an utterance of the number of 
the telephone key. Alternatively, the input is an indication 
that the incorrectly captured alphabetic character differs 
from the uttered alphabetic character. 

5 Claims, 3 Drawing Sheets 
DISAMBIGUATION OF ALPHABETIC
CHARACTERS IN AN AUTOMATED CALL 
PROCESSING ENVIRONMENT 

BACKGROUND OF THE INVENTION 

The present invention relates to automated call 
processing, and, more particularly, is directed to capturing 
alphabetic characters in an automated call processing envi- 
ronment. 

Automated call processing has achieved widespread 
usage. Applications include call routing, voice mail, direc- 
tory assistance, order processing, information dissemination 
and so forth. 

However, existing telephone-based services in which a
caller is interacting with a computer do not capture alpha- 
betic character strings with a high degree of accuracy when 
the strings comprise letters which are selected from an 
unlimited or very large domain, such as names. Since the set 
of character strings cannot be defined in advance, the string 
must be spelled as it is captured. 

Automatically capturing alphabetic spelled character 
strings using only voice input is not feasible presently 
because letter recognition accuracy is too low with available 
voice recognition technology. For example, it is difficult to 
automatically distinguish "B" from "P". 

Methods of automatically capturing alphabetic spelled 
character strings using only dual tone multifrequency 
(DTMF) input from a twelve-key keypad on a telephone set 
are cumbersome, as each telephone key does not uniquely 
map to a single alphabetic character. Consequently, multiple 
inputs per letter are required for disambiguation, e.g., to 
indicate "K" press "5" twice or press "5", "2". These 
methods are also error-prone due to the problem of the user 
accidentally pressing the wrong key or multiple keys and 
being unaware of the error, the so-called "fat finger" effect. 
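A minimal Python sketch of the two multiple-input encodings mentioned above follows. It assumes the classic keypad lettering, which has no key for "Q" or "Z". The names `KEYPAD`, `key_for`, `multitap` and `two_key` are illustrative only and are not part of any described system.

```python
# Letters printed on a conventional twelve-key telephone keypad.
# Assumption: the classic layout, with no key for "Q" or "Z".
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}


def key_for(letter: str) -> tuple[str, int]:
    """Return the key carrying `letter` and the letter's 1-based position on it."""
    for key, letters in KEYPAD.items():
        if letter in letters:
            return key, letters.index(letter) + 1
    raise ValueError(f"no telephone key carries {letter!r}")


def multitap(letter: str) -> str:
    """Press the key once per position, e.g. "K" -> "55"."""
    key, position = key_for(letter)
    return key * position


def two_key(letter: str) -> str:
    """Press the key, then a digit giving the position, e.g. "K" -> "52"."""
    key, position = key_for(letter)
    return key + str(position)


# A single press of "5" cannot tell J, K and L apart.
print(KEYPAD["5"])                  # JKL
print(multitap("K"), two_key("K"))  # 55 52
```

Because "J", "K" and "L" share the "5" key, one key press identifies only a group of letters. This is why these schemes need a second input for every letter.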

SUMMARY OF THE INVENTION 

Automated capture of an uttered alphabetic character is 
provided in accordance with the principles of this invention 
by using an input, beyond the uttered alphabetic character, 
to disambiguate an incorrectly captured character. 

In an exemplary embodiment of this invention, at least 
one uttered alphabetic character is captured by receiving a 
signal indicative of the uttered alphabetic character, auto- 
matically finding a first candidate alphabetic character cor- 
responding to the received signal, inquiring whether the first 
candidate alphabetic character is the uttered alphabetic 
character, and receiving an input for use in disambiguating 
the received signal when the first candidate alphabetic 
character differs from the uttered alphabetic character. 

The input is an indication of a telephone key representing 
the uttered alphabetic character. The indication can be a dual 
tone multifrequency signal or an utterance of the number of 
the telephone key. Alternatively, the input is an indication 
that the first candidate alphabetic character differs from the 
uttered alphabetic character. 
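The key-indication input can be illustrated with a short Python sketch. It assumes the recognizer has already scored every letter against the received signal. The score values, the spoken-digit vocabulary and all function names are hypothetical. The code shows one way the indicated key could narrow the candidates. It is not the claimed method itself.

```python
from typing import Callable, Dict, List, Optional

# Letters per telephone key. Assumption: "Q" and "Z" are assigned to "0",
# one of the provisions suggested in the detailed description.
KEY_LETTERS = {
    "0": "QZ", "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}

# Hypothetical recognizer output for a spoken key number (pulse-dial callers).
SPOKEN_DIGITS = {
    "zero": "0", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def key_from_indication(indication: str) -> str:
    """Normalize a DTMF digit ("6") or a spoken number ("six") to a key."""
    token = indication.strip().lower()
    key = SPOKEN_DIGITS.get(token, token)
    if key not in KEY_LETTERS:
        raise ValueError(f"key indication carries no letters: {indication!r}")
    return key


def disambiguate(scores: Dict[str, float], first_candidate: str,
                 indication: str) -> List[str]:
    """Letters on the indicated key, minus the refused first candidate, best first.

    `scores` is a hypothetical similarity between the received signal and each
    letter's stored signal; higher means a closer match.
    """
    key = key_from_indication(indication)
    subset = [c for c in KEY_LETTERS[key] if c != first_candidate]
    return sorted(subset, key=lambda c: scores.get(c, float("-inf")), reverse=True)


def confirm_in_order(candidates: List[str],
                     confirm: Callable[[str], bool]) -> Optional[str]:
    """Offer candidates one at a time; None means the user must re-utter."""
    for candidate in candidates:
        if confirm(candidate):
            return candidate
    return None


# The user said "N", the recognizer preferred "M", and the user pressed "6".
scores = {"M": 0.91, "N": 0.88, "O": 0.40}
remaining = disambiguate(scores, first_candidate="M", indication="6")
print(remaining)                                        # ['N', 'O']
print(confirm_in_order(remaining, lambda c: c == "N"))  # N
```

Both forms of indication pass through one normalizing function. The rest of the logic is therefore identical for DTMF callers and pulse-dial callers.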

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram illustrating a configuration in 
which the present invention is applied; 

FIG. 2 is a flowchart of a method of automatically 
capturing an uttered alphabetic character; and 

FIG. 3 is a flowchart of another method of automatically 
capturing an uttered alphabetic character. 
DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS 

The present invention is related to the invention of U.S. 
patent application Ser. No. 08/580,702, filed Dec. 29, 1995, 
the disclosure of which is hereby incorporated by reference. 

In an automated call processing scenario, for example, a
caller, also referred to herein as a user of the automated call
processing system, is assumed to have decided that he or she
wishes to enter his or her name or other alphabetic information
into the system, for a purpose such as placing an order or
receiving information. In this scenario, the user has available
only a conventional telephone set, i.e., any telephone set
unable to directly transmit alphabetic information across the
telephone network, and communicates via this telephone set
with the system.

Referring now to the drawings, and in particular to FIG.
1, there is illustrated a system 900 in which the present
invention is applied. As mentioned, a user is assumed to
have access to only a conventional telephone set 910 which
communicates with the system 900 using conventional
telecommunications facilities such as wired or wireless
telecommunications systems known to one of ordinary skill in
the art.

The system 900 comprises communications interface
(COMM INTFC) 920, speech generation module (SPEECH
GEN) 930, speech recognition module (SPEECH RECOG)
940, storage interface (STORAGE INTFC) 950, storage
medium 960, memory 970, processor 980 and communications
links therebetween.

Communications interface 920 is adapted to receive calls
from a user telephone set 910, to supply synthesized speech
from speech generation module 930 to the telephone set 910,
to forward signals from the telephone set 910 to speech
recognition module 940, and to exchange information with
processor 980. The system shown in FIG. 1 includes a
communications bus and separate communications lines for
carrying voiceband signals between the communications
interface 920 and each of speech generation module 930 and
speech recognition module 940, but one of ordinary skill in
the art will appreciate that other configurations are also
suitable.

Speech generation module 930 is adapted to receive
control commands from processor 980, to generate a voiceband
signal in response thereto, and to deliver the generated
signal to communications interface 920. Preferably,
speech generation module 930 generates synthesized speech
in a frequency band of approximately 300-3,300 Hz. In
some embodiments, speech generation module 930 may also
function to transmit ("play") pre-stored phrases in response
to commands from processor 980; module 930 includes
appropriate signal storage facilities in these cases.

Speech recognition module 940 is adapted (i) to receive
from communications interface 920 a voiceband signal
which can be a speech signal or a DTMF signal generated in
response to depression of a key on the telephone set 910, (ii)
to process this signal as described in detail below and in
response to commands from processor 980, and (iii) to
deliver the results of its processing to processor 980. As will
be appreciated, in some embodiments speech recognition
module 940 includes storage for holding predetermined
signals and/or for holding speech signals from telephone set
910 for the duration of a call.
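The description does not say how speech recognition module 940 decodes a DTMF signal. One common approach measures the energy at the DTMF row and column frequencies with the Goertzel algorithm and picks the strongest tone of each group. It is shown here as an assumption, not as the module's actual design. The sketch assumes 8 kHz sampling and a block of 205 samples. The function names are illustrative.

```python
import math
from typing import List, Sequence

# Standard DTMF tone pairs: one low (row) and one high (column) tone per key.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477)  # 1633 Hz (keys A-D) omitted: absent on a 12-key pad
KEYS = ("123", "456", "789", "*0#")


def goertzel_power(samples: Sequence[float], rate: int, freq: float) -> float:
    """Signal power at `freq`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples: Sequence[float], rate: int = 8000) -> str:
    """Return the key whose row and column tones carry the most energy."""
    row = max(range(len(ROW_HZ)),
              key=lambda i: goertzel_power(samples, rate, ROW_HZ[i]))
    col = max(range(len(COL_HZ)),
              key=lambda j: goertzel_power(samples, rate, COL_HZ[j]))
    return KEYS[row][col]


def dtmf_tone(key: str, rate: int = 8000, n: int = 205) -> List[float]:
    """Synthesize a clean test tone for `key` (demonstration only)."""
    row = next(i for i, keys in enumerate(KEYS) if key in keys)
    col = KEYS[row].index(key)
    return [math.sin(2 * math.pi * ROW_HZ[row] * t / rate)
            + math.sin(2 * math.pi * COL_HZ[col] * t / rate)
            for t in range(n)]


print(detect_key(dtmf_tone("6")))  # 6
```

A deployed detector would also check tone duration, the relative level of the two tones and the energy at harmonics, so that speech is not mistaken for a key press. Those checks are omitted here.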

Storage interface 950 is adapted to deliver information to
and retrieve information from storage medium 960 in
accordance with commands from processor 980. The storage
medium 960 may be any appropriate medium, such as
magnetic disk, optical disk, tape or transistor arrays.

Memory 970 may be implemented by using, for example,
ROM and RAM, and is adapted to store information used by
processor 980.

Processor 980 is adapted to execute programs for
interacting with the user of telephone set 910 in accordance
with a control program typically stored on storage medium
960 and also loaded into memory 970. Processor 980 may
also communicate with other systems via communications
links (not shown), for example, to retrieve user-specific
information from a remote database and/or to deliver
information captured from the user of telephone set 910 to
a remote database.

In a typical call processing operation, the user employs
telephone set 910 to place a call to system 900.
Communications interface 920 receives the call and notifies
processor 980 of an incoming call event. Processor 980, in
accordance with its control program, instructs speech
generation module 930 to generate a speech signal. Speech
generation module 930 generates the requested speech signal
and delivers the generated signal to communications
interface 920, which forwards it to the telephone set 910.

In response to the generated speech signal, the user enters
information into system 900 via telephone set 910. As
described in detail below, the information can be a speech
signal or a DTMF signal generated in response to depression
of a key on the telephone set 910.

Communications interface 920 (a) receives the
user-generated signal; (b) notifies processor 980 that a
signal has been received; and (c) delivers the signal to
speech recognition module 940. The module 940 processes
the signal in accordance with the present invention, as
described in detail below, and delivers the result of its
processing to processor 980. Based on this result, processor
980 proceeds through its control program, generally
instructing speech generation module 930 to request
information from the user or to deliver information to the
user, and receiving processed user input from speech
recognition module 940.

Entry of alphabetic information according to the present
invention, from the user to the system, will now be
described.

FIG. 2 illustrates a flowchart for a method of automatically
capturing an uttered alphabetic character. The character
capture method illustrated in FIG. 2 generally involves the
user uttering a character, and the system presenting what it
has determined as the first candidate character to the user. If
the first candidate character is correct, that is, the first
candidate character is the character uttered by the user, then
the system goes on to capture the next character. If the first
candidate character is incorrect, then the system asks for an
input to aid in disambiguating the uttered character. The
input is preferably a DTMF signal for a telephone key. The
DTMF signal input narrows the range of possible characters,
and in combination with the uttered character, typically
results in a correctly identified character.

The flowchart illustrated in FIG. 2 encompasses the
actions of the elements shown in FIG. 1. For example, a
control and processing program executed by processor 980
will be apparent to one of ordinary skill in the art in view of
FIG. 2.

An advantage of the present method is that the user utters
only the desired character, and not additional information,
when the speech recognition portion of the automated
system is capable of correctly capturing the character. In
other words, additional user input is required only when the
automated capture is inaccurate, to compensate for the
inadequacy in voice recognition technology. Therefore,
voice recognition technology which has imperfect letter
recognition accuracy may now be utilized to provide highly
accurate character capture.

At step 110 of FIG. 2, the system prompts the user to utter
an alphabetic character. For example, a typical system
prompt may be, "Please spell your name, beginning with the
first letter." After the first character has been correctly
captured, the system prompt may change to, "Please say the
next character, or the word 'DONE' to go on," or to, "Please
say the next character, or press the pound sign to go on."

The user responds by uttering an alphabetic character,
such as "N". At step 120, the system receives a signal
indicative of the uttered alphabetic character. In this
example, the signal represents the utterance "en".

At step 130, the system accesses a set of stored signals
representing spoken alphabetic characters. The set generally
comprises signals representing the utterances "ay", "bee",
"see", "dee", and so on. Some alphabetic characters may
have multiple stored signals, such as "zee" and "zed" for
"Z". The system compares the received signal with the
stored signals, selects the stored signal which best matches
the received signal, and finds the alphabetic character
corresponding to the best matching stored signal. In this
example, the system is assumed to select the stored signal
for "em" as the best matching signal.

At step 140, the system inquires whether the alphabetic
character corresponding to the best matching stored signal,
that is, the first candidate character, is the uttered character.
The inquiry is generated using speech generation technology
known to one of ordinary skill in the art. Preferably, the
system inquiry includes information for assuring that the
user correctly understands the alphabetic character
presented by the system. For example, the system may inquire,
"I understood M as in Mary. Is this correct?" In this
example, the additional information is a word "Mary"
associated with the best matching alphabetic character "M",
where the spelling of the word begins with the best matching
alphabetic character. The user replies with, typically, a "yes"
or "no" answer, which can be processed by presently
available voice recognition technology with a relatively high
level of accuracy. At step 150, the system receives the user's
reply.

At step 160, the system uses the reply to determine
whether the alphabetic character corresponding to the best
matching stored signal is the uttered character. If the
character selected by the system matches the uttered character,
then the system has correctly captured an alphabetic
character and goes on to ask for the next character at step 110.

If the character selected by the system does not match the
uttered character, then, at step 170, the system prompts the
user to enter an input for correctly disambiguating the
received signal as the desired character.

Preferably, the user has a telephone set which provides
DTMF, and so the system prompt is, "Please touch the
telephone key having the desired character." Provision is
made for characters which do not correspond to a telephone
key, namely, "Q" and "Z", such as assigning them to the "0"
key.

Alternatively, the user may be prompted to speak the
number of the telephone key having the desired character.
This alternative is useful when the caller has a pulse
telephone set, i.e., does not have the capability to enter DTMF.
Presently available voice recognition technology has a high
level of accuracy for recognition of single digits, as
compared with the level of accuracy for recognition of spoken
alphabetic characters.

At step 180, the system receives the input entered by the
user for disambiguating the uttered alphabetic character. For
example, the input may be the DTMF generated by
depression of the "6" telephone key, corresponding to the letters
"M", "N", "O".

In another embodiment, if the character selected by the
system does not match the uttered character, then, at step
170, the system prompts the user to speak a word beginning
with the uttered character. For example, a system prompt
may be, "Please say Nancy if the character is N or Mary if
the character is M." Then, at step 180, the system receives
the input. In this embodiment, the system is changed to
expect an utterance corresponding to one of, e.g., "Nancy"
or "Mary". That is, for the input used for disambiguation, the
expectations of the speech recognizer regarding the nature of
the input change relative to the nature of the input expected
initially.

At step 190, the system uses the input to select a subset
of the stored signals representing spoken alphabetic
characters. In this example, the subset comprises the stored
signals for the letters "M", "N", "O".

At step 200, the system eliminates the stored signals for
the first candidate character, which has already been
presented to the user, from the subset, if the first candidate
character is in the subset. In this example, the system
eliminates the stored signal(s) for the letter "M".

At step 210, the system orders the remaining stored
signals in the subset by similarity to the received signal. In
this example, the received signal is "en", the remaining
stored signals in the subset are "en" and "oh", and the
ordered subset is {"en", "oh"}.

At step 220, the system selects the best matching signal in
the stored subset as the second candidate character, namely,
"en", and at step 230, the system inquires whether the
second candidate alphabetic character is the uttered
character.

At step 240, the system receives the user's reply.

At step 250, the system uses the reply to determine
whether the second candidate alphabetic character is the
uttered character. If the second candidate character selected
by the system matches the uttered character, then the system
has correctly captured an alphabetic character and goes on to
ask for the next character at step 110.

If the second candidate character selected by the system
does not match the uttered character, then, at step 260, the
system eliminates the just refused second candidate
character from the subset, and determines whether anything is left
in the subset. If something is left, then the system goes to
step 220 and tries the best remaining character. In this
example, if the user refused "en", then "oh" would still
remain in the subset, and would be presented to the user.

If nothing is left in the subset, then the system has been
unable to correctly capture the uttered character. At step 270,
the system prompts the user to re-utter the character, and
returns to step 120 to re-try capturing the character. If this is
the situation, then at step 130, the system eliminates the
signals corresponding to the already refused characters from
the stored signals when selecting the best matching stored
signal.

Referring now to FIG. 3, there is illustrated a flowchart for
another method of automatically capturing an uttered
alphabetic character.

The character capture method illustrated in FIG. 3
generally involves the user uttering a character, and the system
presenting what it has determined as the best matching
character to the user. If the presented character is correct,
that is, the presented character is the character uttered by the
user, then the system goes on to capture the next character.
If the presented character is incorrect, then the system
presents its next best matching character to the user and
inquires whether this character is correct. In this method, the
responses of the user are inputs to aid in disambiguating the
uttered alphabetic character.

An advantage of this method is that the user has a very
simple structured interaction with the system, that is, the
user either accepts or rejects the characters presented by the
system. This method also permits voice recognition
technology which has imperfect letter recognition accuracy to be
utilized to provide highly accurate character capture.

Steps 410 and 420 of FIG. 3 are similar to steps 110 and
120 of FIG. 2, and, for brevity, will not be discussed in
detail.

At step 430, the system accesses a set of stored signals
representing spoken alphabetic characters, as described
with respect to step 130 of FIG. 2. Preferably, the
system selects the stored signals which match the received
signal to within a predetermined threshold as the best
matching signals, and then orders the stored signals
by similarity to the received signal, to generate an ordered
best matching subset.

Alternatively, the system compares the received signal
with the stored signals, selects the stored signal which best
matches the received signal, finds the alphabetic character
corresponding to the best matching stored signal, and
performs a look-up for the alphabetic character to obtain
an ordered best matching subset.

In another alternative, the system compares the
received signal with the stored signals, selects the stored
signal which best matches the received signal and finds the
alphabetic character corresponding to the best matching
stored signal. In this alternative, a best matching subset is
generated only when the best matching alphabetic character
is rejected by the user.

Steps 440-460 of FIG. 3 are similar to steps 140-160 of
FIG. 2, and, for brevity, will not be discussed in detail.

If the best matching character selected by the system does
not match the uttered character, then, at step 470, the system
determines whether anything is left in the best matching
subset. If something is left in the subset, then at step 480, the
system selects the next entry in the subset as the best
matching alphabetic character, and loops back to step 440 to
check whether the user accepts this new best matching
alphabetic character.

It will be appreciated that if a best matching subset has not
yet been determined, then at step 470, it is necessary to
determine the best matching subset, and eliminate the just
rejected character from the best matching subset.

If nothing is left in the subset, then the system has been
unable to correctly capture the uttered character. At step 490,
the system prompts the user to re-utter the character, and
returns to step 420 to re-try capturing the character. If this is
the situation, then at step 430, the system eliminates the
signals corresponding to the already refused characters from
the stored signals when selecting the best matching stored
signal.

Although illustrative embodiments of the present
invention, and various modifications thereof, have been
described in detail herein with reference to the
accompanying drawings, it is to be understood that the invention is not
limited to these precise embodiments and the described
modifications, and that various changes and further
modifications may be effected therein by one skilled in the art
without departing from the scope or spirit of the invention as
defined in the appended claims.
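The FIG. 3 method can be sketched in Python as follows. The sketch assumes a recognizer that reports a distance between the received signal and every stored signal, where a smaller distance means a closer match. It follows the preferred variant of step 430: stored signals within a threshold are kept, then ordered by similarity. The distances, the threshold, the `LETTER_OF` table and the retry limit `max_attempts` are hypothetical. The description itself does not bound the number of re-utterances.

```python
from typing import Callable, Dict, List, Optional, Set


def ordered_best_matching_subset(distances: Dict[str, float],
                                 letter_of: Dict[str, str],
                                 threshold: float,
                                 refused: Set[str]) -> List[str]:
    """Step 430 (preferred variant): letters within `threshold`, closest first.

    `distances` maps each stored-signal label (e.g. "zee", "zed") to a
    hypothetical distance from the received signal; smaller is closer.
    Letters already refused by the user are left out.
    """
    best: Dict[str, float] = {}
    for label, distance in distances.items():
        letter = letter_of[label]
        if distance <= threshold and letter not in refused:
            # A letter with several stored signals keeps its closest one.
            best[letter] = min(distance, best.get(letter, distance))
    return sorted(best, key=best.__getitem__)


def capture_fig3(utter: Callable[[], Dict[str, float]],
                 letter_of: Dict[str, str],
                 confirm: Callable[[str], bool],
                 threshold: float,
                 max_attempts: int = 3) -> Optional[str]:
    """Offer candidates best-first until one is accepted or attempts run out."""
    refused: Set[str] = set()
    for _ in range(max_attempts):
        distances = utter()  # prompt, receive and score a (re-)utterance
        for letter in ordered_best_matching_subset(distances, letter_of,
                                                   threshold, refused):
            if confirm(letter):  # steps 440-460: "Is this correct?"
                return letter
            refused.add(letter)  # steps 470-480: move to the next entry
        # Step 490: subset exhausted, so the loop asks for a re-utterance.
    return None


LETTER_OF = {"em": "M", "en": "N", "oh": "O", "bee": "B", "zee": "Z", "zed": "Z"}
utterances = iter([{"em": 0.2, "en": 0.3, "oh": 0.9, "bee": 0.8},
                   {"em": 0.1, "en": 0.2, "oh": 0.5, "bee": 0.9}])
# The user meant "O": "M" and "N" are refused, then "O" is found on retry.
print(capture_fig3(lambda: next(utterances), LETTER_OF,
                   lambda c: c == "O", threshold=0.7))  # O
```

The set of refused letters is kept across re-utterances. This mirrors the instruction that signals for already refused characters are eliminated when the character is re-tried at step 430.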
What is claimed is: 

1. A method of capturing at least one uttered alphabetic 
character, comprising the steps of: 

receiving a signal indicative of an uttered alphabetic 
character; 

accessing a set of stored signals representing spoken 
alphabetic characters; 

automatically finding a first candidate alphabetic charac-
ter from the set of stored signals corresponding to the
received signal;

inquiring from a user whether the first candidate alpha- 
betic character is the uttered alphabetic character; and 

receiving an input for use in disambiguating the received 
signal when the first candidate alphabetic character
differs from the uttered alphabetic character, wherein 
the input is an indication of a telephone key represent- 
ing the uttered alphabetic character and wherein the 
telephone key indication is an utterance; and 

disambiguating the received signal, by automatically find-
ing an alternate candidate alphabetic character that
accurately captures the uttered character; wherein the
step of automatically finding the alternate candidate
alphabetic character includes using the input to com-
pare the stored signals with the received signal until
one of the stored signals matches the received signal.

2. A method of accurately capturing at least one uttered 
alphabetic character, comprising the steps of: 

receiving a signal indicative of an uttered alphabetic 
character; 

accessing a set of stored signals representing spoken 
alphabetic characters; 

automatically finding a first candidate alphabetic charac-
ter corresponding to the received signal;

inquiring from a user whether the first candidate alpha- 
betic character is the uttered alphabetic character; 

receiving an input for use in disambiguating the received 
signal when the first candidate alphabetic character 
differs from the uttered alphabetic character;

automatically finding a second candidate alphabetic char- 
acter corresponding to the received signal in accor- 
dance with the input, wherein the step of automatically 
finding the second candidate alphabetic character 
includes comparing the stored signals with the received
signal; eliminating a stored signal representing the first 
candidate alphabetic character from the stored signals 
and inquiring whether the second candidate alphabetic 
candidate character accurately captures the uttered 
character; wherein 

automatically finding an alternative candidate alphabetic 
character; wherein the step of automatically finding 
includes using the input to compare a selected group of 
the stored signals representing the spoken alphabetic
characters indicated by the input with the received 
signal until one of the stored signals matches the 
received signal. 
3. Apparatus for capturing at least one uttered alphabetic
character, comprising: 

means for receiving a signal indicative of an uttered 
alphabetic character; 

means for a set of stored signals representing spoken 
alphabetic characters; 

means for automatically finding a first candidate alpha- 
betic character corresponding to the received signal; 

means for inquiring from a user whether the first candi- 
date alphabetic character is the uttered alphabetic char- 
acter; 

means for receiving an input for use in disambiguating the 
received signal when the first candidate alphabetic 
character differs from the uttered alphabetic character, 
wherein the input is an indication of a telephone key 
representing the uttered alphabetic character and 
wherein the telephone key indication is an utterance; 
disambiguating the received signal, by automatically 
finding an alternate candidate alphabetic character that 
accurately captures the uttered character; wherein the 
step of automatically finding the alternate candidate 
alphabetic character includes using the input to com- 
pare a selected group of the stored signals with the 
received signal until one of the stored signals matches 
the received signals. 

4. The apparatus of claim 3, wherein the utterance rep- 
resents a number corresponding to the telephone key. 

5. Apparatus for capturing at least one uttered alphabetic 
character, comprising: 

means for receiving a signal indicative of an uttered 
alphabetic character; 

means for automatically finding a first candidate alpha- 
betic character corresponding to the received signal; 

means for accessing a set of stored signals representing 
spoken alphabetic characters; 

means for inquiring from a user whether the first candi- 
date alphabetic character is the uttered alphabetic char- 
acter; 

means for receiving an input for use in disambiguating the 
received signal when the first candidate alphabetic 
character differs from the uttered alphabetic character; 
and 

means for automatically finding a second candidate alpha- 
betic character corresponding to the received signal in 
accordance with the input; wherein the means for 
automatically finding the second candidate alphabetic 
character includes means for comparing the stored 
signals with the received signal and wherein a stored 
signal representing the first candidate alphabetic char- 
acter is eliminated from the stored signals indicated by 
the input; disambiguating the received signal, by auto- 
matically finding an alternate candidate alphabetic 
character that accurately captures the uttered character; 
wherein the step of automatically finding the alternate 
candidate alphabetic character includes the step of 
using the input to compare a selected group of the 
stored signals representing the first spoken alphabetic 
characters indicated by the input with the received
signal until one of the stored signals matches the 
received signal. 